Title: Primary Healthcare Centres Clinical Notes, Bahrain — 2023
Files:
  - data.csv (UTF-8, quoted CSV; 10 columns; N=… rows)
  - data_dictionary.csv
  - LICENSE.txt
  - README.md (this file)

Provenance: Data originate from the national health information system used by nine public primary healthcare centers in the Kingdom of Bahrain. The data controller extracted encounters from calendar year 2023. Source tables included visit-level metadata and clinician-authored notes. Post-extraction processing comprised: (i) column selection and renaming to a stable schema; (ii) UTF-8 normalization; (iii) whitespace and punctuation normalization of free text; (iv) standardization of service labels to a controlled vocabulary; (v) validation of ICD-10 codes (pattern ^[A-Z][0-9]{2}$); (vi) coercion of encounter_month to YYYY-MM; (vii) top-coding of age_at_encounter at 90+; and (viii) generation of pseudonymous identifiers (patient_id = PHC######, encounter_id = EN#######). De-identification removed direct identifiers (e.g., names, ID numbers, phone numbers, emails, precise addresses, URLs, medical record numbers) using deterministic rules (regular expressions, dictionaries, gazetteers) in combination with a PHI-NER model; dates in text were normalized to month-level precision, and residual full dates were masked. Preparation followed approval by the Research Committee of Primary Healthcare Centers (Bahrain) on November 21, 2024, with a waiver of informed consent for secondary research, and was conducted in accordance with the Personal Data Protection Law (PDPL, Law No. 30 of 2018).
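
The format rules stated above (ICD-10 pattern, YYYY-MM months, identifier shapes) can be re-checked on the released file. The following is a minimal sketch, not the pipeline used to prepare the data. It assumes that each "#" in PHC###### and EN####### stands for exactly one decimal digit; the function name invalid_fields is hypothetical. The age top-coding is not checked, because how "90+" is stored is not specified here.

```python
import re

# Patterns derived from the provenance text.
# ASSUMPTION: each '#' in the identifier templates means one decimal digit.
PATTERNS = {
    "encounter_id": re.compile(r"^EN[0-9]{7}$"),
    "patient_id": re.compile(r"^PHC[0-9]{6}$"),
    "encounter_month": re.compile(r"^[0-9]{4}-(0[1-9]|1[0-2])$"),
    "diagnosis_icd10": re.compile(r"^[A-Z][0-9]{2}$"),  # pattern as stated in the README
}


def invalid_fields(row):
    """Return the names of fields in `row` that are missing or fail their pattern."""
    # fullmatch rejects values with trailing newlines or extra characters.
    return [
        name
        for name, pattern in PATTERNS.items()
        if not pattern.fullmatch(row.get(name) or "")
    ]
```

With csv.DictReader, each row of data.csv is a dict that can be passed to invalid_fields directly; an empty list means all four fields match the stated formats.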

Variables (excerpt):
  encounter_id: Randomized encounter key (string)
  patient_id: Randomized pseudonymous ID (string; non-reversible)
  sex_at_birth: {male,female}
  age_at_encounter: Integer years (top-coded at 90+)
  encounter_month: YYYY-MM (string; month-level date)
  care_service: Service line/category (string; controlled list)
  provider_role: Clinician role (string)
  diagnosis_icd10: ICD-10 three-character code (string; e.g., "S09")
  clinical_note: Free-text clinical note (string; de-identified)
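
Because clinical_note is quoted free text, it may contain commas or line breaks, so a CSV-aware reader is safer than splitting lines by hand. The sketch below uses Python's standard csv module and keeps every column as a string. The helper name read_encounters and the sample values are illustrative only and are not taken from the dataset.

```python
import csv
import io


def read_encounters(handle):
    """Yield one dict per CSV row, keeping every column as a string."""
    # csv.DictReader handles quoted fields, including embedded commas and newlines.
    yield from csv.DictReader(handle)


# In-memory sample with made-up values, for illustration only.
sample = io.StringIO(
    "encounter_id,patient_id,clinical_note\r\n"
    '"EN0000001","PHC000001","Follow-up visit, no complaints."\r\n'
)
rows = list(read_encounters(sample))
```

For the released file, the handle would typically be opened as open("data.csv", newline="", encoding="utf-8"), which matches the UTF-8 quoted CSV format listed under Files.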

How to cite: Abdulla, Hasan, 2025, "Primary Healthcare Centres Clinical Notes, Bahrain (2023): De-identified Encounters with ICD-10 Annotations", https://doi.org/10.7910/DVN/ZZXKKH, Harvard Dataverse

License / Terms: [CC0]
Contact: [Hasan Abdulla, Primary Healthcare Centres, habdulla@phc.gov.bh]